ABSTRACT

This paper describes a system for understanding English definitions of software modules and generating formal specifications. The design of the system itself is emphasized, particularly

- the choice of a target specification language,

- selection of a parsing strategy, and

- treatment of semantic problems, such as understanding spatial metaphor and interpreting known words in new environments.


1. INTRODUCTION

A cornerstone of the design of large software systems is the definition of their modules based on the information-hiding principle [Parnas 72]. Given that principle, one should define a module interface without revealing the module's (internal) implementation.

One could define a module interface using a formal language such as SPECIAL [Roubine 76] or AFFIRM [Guttag 78, Musser 79], or using natural language. Formal specifications of modules offer many advantages for the design of large software systems, including lack of ambiguity, precision, attention to detail, mechanical processing, and appropriateness for both proof techniques and transformations. Nevertheless, few would argue that they are either easy to create or easy to understand once created.

[Balzer 78] argues that some aspects of informality are actually desirable in specifying software modules. Furthermore, [Hobbs 77] cites several aspects of natural language semantics which are preferable to existing formal languages.

This paper investigates an artificial intelligence approach to combining the advantages of both formal and natural languages. The long-term goal is a system which could take as input an English definition of a module and generate an equivalent formal specification. In addition, the system should generate an English paraphrase of its understanding of the input so that the user¹ may easily check the system's understanding.

The remainder of the paper describes the design decisions made in implementing a prototype to understand English texts defining data structures. Section 2 enumerates some of the reasons we feel are most important for using natural language. Section 3 defines the target specification language and the motivation for selecting it. Section 4 relates our experience in using a parser for texts defining data structures. Section 5 deals with semantic issues such as interpreting spatial metaphors and selecting precise translations of vague English terms. Related work and our conclusions are presented in sections 6 and 7.


2. THE MOTIVATION FOR NATURAL LANGUAGE

It is our contention that a key to human understanding is having a model of the concept to be understood. For complex concepts, the model may not be very precise, but it does provide a framework within which detail can be organized. Present formal languages are weak at expressing such models, unless they are stated in natural language as comments or other accompanying explanations.

If this contention is true, we should see numerous examples of English linguistic expressions used to convey a view of a software module. In examining simple definitions of data structures, we indeed find such expressions. For example, spatial metaphors are prevalent in data structure descriptions, even though the spatial metaphor has little to do with the actual physical realization of any implementation. One speaks of the ends of a list, the top of a stack, running down a list, following a pointer, etc. Such expressions are simple metaphors in that they provide a spatial view which is independent of software or hardware existing in three dimensions.

Analogy also supports this contention, since analogy describes one concept in terms of another, and since it operates at a general, rather than detailed, level. For instance, one finds definitions such as "A stack is a queue in which all insertions and deletions occur at one end" in [Horowitz 76]. Similarly, in the English definition of the design of the kernel of a secure operating system [Ford 78], one finds the argument to a system call defined as "the sticky bit of UNIX²".

Yet another instance is the use of vague terms to connote certain impressions. In defining a queue in terms of an ordered list, [Horowitz 76] defines the enqueue operation as "adding an element i to the rear of a queue", even though no operation of adding has been defined for ordered lists. Rather, a notion of adding something to a sequence is presumed, as well as the ability to generalize that notion to the new data structure.

¹ The user of this AI system is specifying a new system; he/she is an expert in the application area.

² UNIX is a trademark of Bell Laboratories.

If our contention is correct, one should see concepts introduced specifically to summarize information. For instance, in the English description of a provably secure operating system [Ford 78], one finds the statement "A SEID is returned as the result of all new object creations (K_build_segment, ...)." This introduces the concept of interface operations that create new objects. It then uses that concept to summarize the fact that six operations are somewhat similar in purpose and return the same class of value.

One could argue that this evidence is not compelling, and that all that is needed is richer formal language semantics. For instance, one could try to incorporate a rigorous, precise definition of analogy in formal languages. Yet the prevalence and value of mnemonics for procedure names, arguments, constants, etc. strongly suggest that, no matter what the formal language, a critical part of understanding formal specifications will depend on providing an individual with intuitive concepts as a framework for organizing detail. If purely formal languages were adequate, then mathematics, after all these centuries, would be stated exclusively in formalisms.

If, as we contend, a key to human understanding is having a model of the concept to be understood, then natural language has many desirable qualities for specification. Other motivations are presented in [Balzer 78] and [Hobbs 77].

3. SELECTION OF TARGET LANGUAGE AND KNOWLEDGE REPRESENTATION LANGUAGE

The target formal specification language selected is Horn clauses. A Horn clause has the form

C IF A1 & A2 & ... & An

where n >= 0, C is an atomic formula, and each Ai is an atomic formula³. A Horn clause is therefore a restricted form of formula in first-order logic. Since all variables are free, all variables are implicitly universally quantified.
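To make the clause form concrete, the Python sketch below represents an atomic formula as a tuple whose first element is the predicate, which mirrors the LISP-like prefix notation used for the examples in this section. A Horn clause is a head C plus a possibly empty body A1 ... An, where an empty body (n = 0) is an unconditional fact. The names `HornClause`, `is_variable` and `to_sexpr` are hypothetical and chosen only for illustration. This is not the representation used in the system described here.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical representation: constants and variables are strings (variables
# start with "_", as in this paper's notation); an atomic formula or compound
# term is a tuple whose first element is the predicate or function symbol.

def is_variable(t) -> bool:
    return isinstance(t, str) and t.startswith("_")

def to_sexpr(t) -> str:
    """Render a term in LISP-like prefix notation, e.g. (TUPLE _S)."""
    if isinstance(t, tuple):
        return "(" + " ".join(to_sexpr(s) for s in t) + ")"
    return str(t)

@dataclass(frozen=True)
class HornClause:
    head: tuple                    # the atomic formula C
    body: Tuple[tuple, ...] = ()   # A1 ... An; n = 0 gives an unconditional fact

    def variables(self) -> frozenset:
        """All variables of the clause; each is implicitly universally quantified."""
        found = set()

        def walk(t):
            if is_variable(t):
                found.add(t)
            elif isinstance(t, tuple):
                for s in t:
                    walk(s)

        walk(self.head)
        for atom in self.body:
            walk(atom)
        return frozenset(found)

    def __str__(self) -> str:
        if not self.body:
            return to_sexpr(self.head)
        return to_sexpr(self.head) + " IF " + " & ".join(to_sexpr(a) for a in self.body)

clause = HornClause((">=", ("LENGTH", "_S"), 0), (("TUPLE", "_S"),))
print(clause)  # (>= (LENGTH _S) 0) IF (TUPLE _S)
```

In this encoding, universal quantification needs no explicit marking. Any symbol beginning with an underscore is simply treated as a variable wherever it occurs.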

A set of Horn clauses stating axioms about the operations at the interface of a software module can serve as a specification of the module. A primary reason for selecting Horn clauses as the target language is that they are also highly appropriate as a knowledge representation language for artificial intelligence. Furthermore, Horn clause theorem provers [Chester 80] are one approach to modelling reasoning. Consequently, once a user's English specification has been understood, adding the set of axioms to the knowledge base of the system offers three potentials:

- the system can grow in competence,

- the user may define new entities in terms of modules defined earlier, and

- the user may run the theorem prover to test the correctness of the specification or to prove properties of the specification.

³ An atomic formula is a predicate applied to terms. A term is a constant, a variable, or a function applied to terms.

As an example, consider the Horn clauses below. All predicates (including equalities and inequalities) are expressed in a LISP-like prefix notation. Variables are prefixed by an underscore. (_X . _Y) is an abbreviation for a term (f _X _Y) expressing the result of adding _X at the front of _Y.

1. (>= (LENGTH _S) 0) IF (TUPLE _S)

2. (TUPLE _S) IF (NULL _S)

3. (TUPLE (_X . _Y)) IF (TUPLE _Y) & (CONSISTENT-TYPE (_X . _Y) _T)

They correspond, respectively, to the following facts about tuples:

1. A tuple has a nonnegative length.

2. The null tuple is a tuple.

3. A form whose first element is _X is a tuple if the rest of the form is a tuple and the form has a consistent type _T.

The first Horn clause shows that facts which could be expressed with existentially quantified variables will involve Skolem functions rather than existential quantification. Instead of asserting that every tuple has some length, clause 1 names that length with the function LENGTH.
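The tuple clauses above can be exercised with a small backward-chaining prover. The Python sketch below is a generic depth-first resolution procedure. It is essentially the strategy Prolog uses, here with an occurs check added, and it is not the prover of [Chester 80]. The sketch encodes clauses 2 and 3 and writes the term (_X . _Y) as ("CONS", _X, _Y). It also adds hypothetical NULL and CONSISTENT-TYPE facts so that there is something to prove. Clause 1 is omitted because it needs arithmetic on LENGTH. All function names are illustrative.

```python
import itertools
from typing import Dict, Iterator, List, Optional, Tuple

Subst = Dict[str, object]
Clause = Tuple[tuple, Tuple[tuple, ...]]   # (head, body)
_fresh = itertools.count()

def is_var(t) -> bool:
    return isinstance(t, str) and t.startswith("_")

def walk(t, s: Subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v: str, t, s: Subst) -> bool:
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, x, s) for x in t))

def unify(a, b, s: Subst) -> Optional[Subst]:
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(t, suffix: str):
    """Standardize apart: each use of a clause gets fresh variable names."""
    if is_var(t):
        return t + suffix
    if isinstance(t, tuple):
        return tuple(rename(x, suffix) for x in t)
    return t

def solve(goals: List, kb: List[Clause], s: Subst, depth: int = 0) -> Iterator[Subst]:
    """Prove goals left to right, trying clauses in order (depth-first resolution)."""
    if not goals:
        yield s
        return
    if depth > 50:   # crude guard against unbounded recursion
        return
    first, rest = goals[0], goals[1:]
    for head, body in kb:
        suffix = f"#{next(_fresh)}"
        s2 = unify(first, rename(head, suffix), s)
        if s2 is not None:
            new_goals = [rename(g, suffix) for g in body] + rest
            yield from solve(new_goals, kb, s2, depth + 1)

KB: List[Clause] = [
    # 2. (TUPLE _S) IF (NULL _S)
    (("TUPLE", "_S"), (("NULL", "_S"),)),
    # 3. (TUPLE (_X . _Y)) IF (TUPLE _Y) & (CONSISTENT-TYPE (_X . _Y) _T)
    (("TUPLE", ("CONS", "_X", "_Y")),
     (("TUPLE", "_Y"), ("CONSISTENT-TYPE", ("CONS", "_X", "_Y"), "_T"))),
    # Hypothetical facts, included only for this demonstration.
    (("NULL", "NIL"), ()),
    (("CONSISTENT-TYPE", ("CONS", 2, "NIL"), "INTEGER"), ()),
    (("CONSISTENT-TYPE", ("CONS", 1, ("CONS", 2, "NIL")), "INTEGER"), ()),
]

def provable(goal) -> bool:
    return next(solve([goal], KB, {}), None) is not None

print(provable(("TUPLE", ("CONS", 1, ("CONS", 2, "NIL")))))   # True
print(provable(("TUPLE", ("CONS", "A", "NIL"))))              # False
```

The second query fails only because no CONSISTENT-TYPE fact covers (A . NIL). The structural part of the proof, via clause 2 on NIL, succeeds.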

Formal specification languages are an active topic of research. It is possible that Gist [Balzer 82] or Tecton [Kapur 82] will lead to languages more appropriate to our task. Both attempt to incorporate semantics closer to that of natural language as used in informal specifications. If they succeed, the understanding task may be somewhat simplified, though traditional problems in natural language such as ambiguity, vagueness, reference resolution, and metaphor will still exist.

4. PARSING AND SEMANTIC INTERPRETATION

Given the selected target language, there are several alternatives for the style of grammar and of semantic interpretation. For instance, rather than accounting for all words in a text, one could use a strategy of processing only phrases that are relevant to a domain-dependent schema of stereotypical concepts [Schank 80]. This strategy seems inappropriate for module specification. In defining a non-stereotypical structure, any phrase skipped may be critical to the definition.

Another decision is whether to select a general-purpose grammar having broad coverage of English and then add special phrases, constructions, and lexical items peculiar to the application. The advantage of this, of course, is the potential of using someone else's grammar rather than creating one tailored to the domain. The alternative is to build a grammar specific to the application; semantic grammars [Burton 76] exemplify this. A semantic grammar has nonterminals specific to semantic classes of the domain, such as data structure phrases, rather than syntactic constituents, such as noun phrases. Because of the tight coupling between a phrase and a semantic category, senseless interpretations can be rapidly discarded.

We chose RUS [Bobrow 78], a broad-coverage grammar of English which performs semantic interpretation incrementally. For each constituent y found and proposed as part of a constituent x, the grammar calls the semantic component to extend the semantic representation of x based on finding y. If y's interpretation is inconsistent with the semantic constraints on adding it to x, the parser abandons this parse. When the grammar proposes that a constituent is complete, the semantic component receives a message. It then either returns a semantic interpretation for x or vetoes the proposed parse of x.

The semantic component therefore must be prepared to build structures incrementally, rather than waiting until the end of a constituent x before applying semantic constraints to it. The constraints encoded in our semantic interpreter are organized in case frames, a very common style of encoding selection restrictions. Each phrase has a head lexical item: e.g. verbs for typical clauses, common nouns for typical noun phrases, and prepositions for prepositional phrases. Any lexical item that can serve as a head may have several case frames associated with it, one frame per sense of the head word. A frame is a list of possible phrase slots that may be associated with a word sense. Each slot has an associated semantic constraint that limits the kind of entity that can fill it. For instance, "delete" has three slots:

- the logical subject must be a program or person,

- the logical object must be a data entity, and

- a prepositional phrase whose head is "from" must have an object which is a data structure.

That is, one sense of "delete" is that a person or a program may delete a data entity from a data structure. In addition, a slot may be marked as optional or mandatory. Each case frame also includes a structure-building operation, stating what logical expression is to be built for this form.
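The case frame for "delete" described above can be sketched as a small data structure. In this Python sketch, each slot carries its selection restriction as a set of allowed semantic classes, and `build` stands in for the structure-building operation. The names `Slot`, `CaseFrame` and `DELETE_FRAME`, the semantic class labels, and the shape of the logical form are hypothetical. So is the choice of which slots are mandatory, since the text does not say. The subject is made optional so that imperatives such as "Delete the element ..." are accepted.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Slot:
    allowed: frozenset   # semantic classes that may fill the slot
    mandatory: bool

@dataclass
class CaseFrame:
    head: str
    slots: Dict[str, Slot]
    build: Callable[[Dict[str, str]], tuple]   # structure-building operation

    def admits(self, slot: str, semantic_class: str) -> bool:
        """Selection restriction: may an entity of this class fill the slot?"""
        s = self.slots.get(slot)
        return s is not None and semantic_class in s.allowed

# One sense of "delete": a person or program deletes a data entity from a
# data structure. Mandatory/optional markings are assumed for illustration.
DELETE_FRAME = CaseFrame(
    head="delete",
    slots={
        "subject": Slot(frozenset({"program", "person"}), mandatory=False),
        "object": Slot(frozenset({"data-entity"}), mandatory=True),
        "from": Slot(frozenset({"data-structure"}), mandatory=False),
    },
    # Hypothetical logical form; NIL marks an unfilled optional slot.
    build=lambda f: ("DELETE", f.get("subject", "NIL"), f["object"], f.get("from", "NIL")),
)

print(DELETE_FRAME.admits("object", "data-entity"))    # True
print(DELETE_FRAME.admits("subject", "data-entity"))   # False
print(DELETE_FRAME.build({"object": "A17"}))           # ('DELETE', 'NIL', 'A17', 'NIL')
```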

Processing within the semantic component falls into three cases:

- As soon as the proposed head of a phrase x is found, all its case frames are retrieved as possible word senses.

- As a phrase y is proposed to fill a given slot in x, the semantic constraint of that slot is tested on y, potentially eliminating some of the case frames from further consideration. If none remain, y cannot be added to x.

- When the parser proposes that x is complete, the semantic component eliminates any case frame with unfilled mandatory slots and builds the semantic representation for each remaining case frame.
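The three cases above, together with the veto protocol between RUS and the semantic component, can be sketched as three functions. Each call returns a new partial phrase rather than modifying the old one, so a backtracking parser can still return to an earlier state. The lexicon here is hypothetical. Its second sense of "delete" (deleting a passage from a document) is invented only to show a slot filler eliminating a sense. Returning the sense and its fillers is a stand-in for each frame's structure-building operation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Slot:
    allowed: frozenset
    mandatory: bool

@dataclass
class Frame:
    sense: str
    slots: Dict[str, Slot]

# Hypothetical lexicon with two senses of "delete".
LEXICON: Dict[str, List[Frame]] = {
    "delete": [
        Frame("delete-data", {
            "subject": Slot(frozenset({"program", "person"}), False),
            "object": Slot(frozenset({"data-entity"}), True),
            "from": Slot(frozenset({"data-structure"}), False)}),
        Frame("delete-text", {
            "subject": Slot(frozenset({"person"}), False),
            "object": Slot(frozenset({"text-passage"}), True),
            "from": Slot(frozenset({"document"}), False)}),
    ],
}

@dataclass
class PartialPhrase:
    head: str
    frames: List[Frame]                                  # senses still possible
    fillers: Dict[str, str] = field(default_factory=dict)  # slot -> semantic class

def head_found(head: str) -> PartialPhrase:
    """Case 1: retrieve every case frame of the head as a candidate sense."""
    return PartialPhrase(head, list(LEXICON.get(head, [])))

def add_filler(x: PartialPhrase, slot: str, y_class: str) -> Optional[PartialPhrase]:
    """Case 2: test y against the slot constraint; None means y cannot be added to x."""
    remaining = [f for f in x.frames
                 if slot in f.slots and y_class in f.slots[slot].allowed]
    if not remaining:
        return None
    return PartialPhrase(x.head, remaining, {**x.fillers, slot: y_class})

def complete(x: PartialPhrase) -> Optional[List[Tuple[str, Dict[str, str]]]]:
    """Case 3: drop frames with unfilled mandatory slots; None is a veto."""
    survivors = [f for f in x.frames
                 if all(name in x.fillers
                        for name, s in f.slots.items() if s.mandatory)]
    if not survivors:
        return None
    return [(f.sense, dict(x.fillers)) for f in survivors]

phrase = head_found("delete")
phrase = add_filler(phrase, "object", "data-entity")   # eliminates "delete-text"
print(add_filler(phrase, "from", "document"))          # None: parse abandoned
phrase = add_filler(phrase, "from", "data-structure")
print(complete(phrase))
# [('delete-data', {'object': 'data-entity', 'from': 'data-structure'})]
```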

Three aspects of our experience in using this approach are interesting. First, the major modification to the RUS grammar was to allow mathematical notation, so that one can use it freely within English. Thus, a text such as the following can be parsed by the modified grammar⁴ (the sentences have been numbered for expository purposes):

1. We say that an ordered list is empty or it can be written as (A[1], A[2], ..., A[N]) where the A[i] are atoms from some set S.

2. There are a variety of operations that are performed on these lists.

3. These operations include the following.

4. Find the length N of the list.

5. Retrieve the ith element, 1<=i<=N.

6. Store a new value at the ith position, 1<=i<=N.

7. Insert a new element at position I, 1<=I<=N+1, causing elements numbered I, I+1, ..., N to become numbered I+1, I+2, ..., N+1.

8. Delete the element at position I, 1<=I<=N, causing elements numbered I+1, ..., N to become numbered I, I+1, ..., N-1.

⁴ This text is a modified version of a definition given on pages 41-42 of [Horowitz 76].

The modifications were easy to make, because the patterns in which the mathematical expressions occur fit naturally into the grammar of English.

Second, because the grammar has broad coverage, adding new texts requires proportionally little time on the syntactic aspects. Some new dictionary entries are required, and very infrequently a new construction must be added. Therefore, one can concentrate on the semantic issues, which dominate the effort in extending the system.

Third, RUS's calls to the semantic component eliminate many senseless interpretations. For the text above, the first interpretation found by the parser/semantic component was the correct one in all but one sentence. Furthermore, five of the sentences yielded only one interpretation; the other three yielded only two. As the semantic component is expanded to broader and broader domains, the case frame constraints will be somewhat less effective. For instance, in broader environments one could expect 5 interpretations for the first sentence, only one for each of the next four, and two for each of the last three. Nevertheless, this is radically fewer than the number of interpretations if no selection restrictions were applied during parsing.

Based on these observations, we feel the time is ripe to adopt broad-coverage grammars of English which interact with semantic components to prune senseless parses. The alternative of writing one's own grammar requires substantial time, which could be devoted to other purposes.

5. ADDITIONAL SEMANTIC PROBLEMS

Semantic interpretation, definite reference resolution, and quantifier scope decisions are well-known semantic problems of natural language understanding. Yet even after a system has generated a semantic representation R in which such decisions have been made, further transformation and understanding of the input may still be needed to generate a representation S for the underlying application system. There are at least three reasons for this.




First, consider spatial metaphor. Understanding spatial metaphor seems to require computing some concrete interpretation S for the metaphor. However, that concrete understanding may be attempted after computing a semantic representation R that represents the spatial metaphor formally but without full understanding. The system should generate an English paraphrase of its formal specification so that the user can check its understanding. Such a paraphrase is likely to be both easier to produce and easier for the user to understand if it employs the user's terminology. With an intermediate level of understanding such as R, and English output generated from it, one may not have to recreate the metaphor, because the terms in R use it as a primitive.

Second, the needs of the underlying application system may dictate transformations that are neither essential to understanding the English text nor linguistically motivated. In a data base environment, transformations of the semantic representation may yield a retrieval request that is computationally less demanding [King 80]. To promote portability, EUFID [Templeton 83] and TQA [Damerau 81] are interfaces that have a separate component for transformations specific to the data base. In software specification, mapping the semantic representation R may yield a form S which is more amenable to proving theorems about the specification or to rewriting it into some standard form.

The following example, derived from a definition of stacks on page 77 of [Horowitz 76], illustrates these first two reasons: "A stack is an ordered list in which all insertions and deletions occur at one end called the top." A theorem prover for abstract data types would normally expect the relevant end of the stack to be referred to by a notation such as A[1], if A is the name of the stack. It would not be expected to understand the spatial metaphor "one end".

Third, it may be convenient to design the transformation process in two phases, where the output of both phases is a semantic representation. In our system, we have chosen to map certain paraphrases into a common form via a two-step process. The forms "ith element" and "element i" each generate the same term as a result of semantic interpretation. However, the semantic interpreter generates another term for "element at position i" due to the extra lexical items "at" and "position". Obviously, all three expressions correspond to one concept. The system must recognize that the two terms generated by the semantic interpreter are paraphrases and map them into one form.

In our system, the semantic representation R is in the form of Horn clauses. All semantic interpretation, quantifier scope decisions, and reference resolution have been performed before this second translation phase, which is carried out by the mapping component. The input to the mapping component for the text defining ordered lists is given in the appendix.

The rules of the mapping component are all encoded as Horn clauses. The antecedent atomic formulas of our rules specify either

1. the structural change to be made in the collection of formulas, or

2. conditions which are not structural in nature but which must be true if the mapping is to apply.

We will use the notation (MAPPING-RULE (a1 ... am) _x (c1 ... ck) _y) to mean that the atomic formulas a1 ... am must be present in the list _x of atomic formulas; the list _x is assumed to be implicitly conjoined. The variable _y will be bound to the result of replacing the formulas a1, ..., am in _x with the formulas c1, ..., ck. There is a map between two lists of atomic formulas, _x and _y, if (MAP _x _y) is true.
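The structural part of MAPPING-RULE amounts to pattern matching followed by replacement. The Python sketch below finds distinct formulas in _x that match a1 ... am under consistent variable bindings, removes them, and adds the instantiated c1 ... ck. Because _x is an implicit conjunction, the replacements are simply appended. The function names are hypothetical. Variables are assumed to occur only in the rule, and the formulas in _x to be ground. The sketch also returns the first match only, whereas a Horn clause prover could backtrack over alternative matches.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Formula = tuple
Bindings = Dict[str, object]

def is_var(t) -> bool:
    return isinstance(t, str) and t.startswith("_")

def match(pat, term, b: Bindings) -> Optional[Bindings]:
    """One-way matching: variables may occur in the rule pattern only."""
    if is_var(pat):
        if pat in b:
            return b if b[pat] == term else None
        return {**b, pat: term}
    if isinstance(pat, tuple):
        if not (isinstance(term, tuple) and len(term) == len(pat)):
            return None
        for p, t in zip(pat, term):
            b = match(p, t, b)
            if b is None:
                return None
        return b
    return b if pat == term else None

def substitute(t, b: Bindings):
    if is_var(t):
        return b.get(t, t)
    if isinstance(t, tuple):
        return tuple(substitute(x, b) for x in t)
    return t

def matches(patterns, x, b, used=frozenset()) -> Iterator[Tuple[Bindings, frozenset]]:
    """Yield bindings plus the indices of the distinct formulas of x that matched."""
    if not patterns:
        yield b, used
        return
    for i, formula in enumerate(x):
        if i not in used:
            b2 = match(patterns[0], formula, b)
            if b2 is not None:
                yield from matches(patterns[1:], x, b2, used | {i})

def mapping_rule(a: List[Formula], x: List[Formula],
                 c: List[Formula]) -> Optional[List[Formula]]:
    """(MAPPING-RULE (a1 ... am) x (c1 ... ck) y): return y, or None if no match."""
    for b, used in matches(list(a), x, {}):
        kept = [f for i, f in enumerate(x) if i not in used]
        return kept + [substitute(f, b) for f in c]
    return None

x = [("STACK", "S1"), ("END", "E1", "S1")]
print(mapping_rule([("END", "_E", "_D")], x, [("SEQUENCE-ELEMENT", "_E", 1, "_D")]))
# [('STACK', 'S1'), ('SEQUENCE-ELEMENT', 'E1', 1, 'S1')]
```

The nonstructural conditions of a rule, such as (SEQUENCE _D), are not checked here. In the Horn clause encoding they are separate antecedents of the MAP clause.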

The two examples given earlier are detailed next. For expository purposes, the rules given in this section have been simplified.

Consider the following example: "A stack is an ordered list in which all insertions and deletions occur at one end called the top. ADD(I,S) adds item I to stack S." In this environment, spatial metaphors tend to be more frozen than creative. To understand "one end", we assume the following rules:

1. For a sequence _D, we may map "_E is an end of _D" to "_E is the first sequence element of _D".

2. An ordered list is a sequence.

Facts (1) and (2) are encoded as Horn clauses below.

(MAP _X _Y) IF (MAPPING-RULE ((END _E _D)) _X
                             ((SEQUENCE-ELEMENT _E 1 _D)) _Y) &
               (SEQUENCE _D)

(SEQUENCE _D) IF (ORDERED-LIST _D)
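The two encoded rules can be sketched in Python by checking the nonstructural condition (SEQUENCE _D) through backward chaining over unary subclass rules before rewriting an END formula. The rule (ORDERED-LIST _D) IF (STACK _D) is a hypothetical stand-in for what the clauses generated from "A stack is an ordered list ..." would supply. The names `holds`, `map_end` and `SUBCLASS_RULES` are illustrative only.

```python
from typing import List, Tuple

Formula = Tuple

# (SEQUENCE _D) IF (ORDERED-LIST _D), from fact (2); the second pair is a
# hypothetical stand-in for (ORDERED-LIST _D) IF (STACK _D).
SUBCLASS_RULES = [("SEQUENCE", "ORDERED-LIST"), ("ORDERED-LIST", "STACK")]

def holds(cls: str, entity: str, x: List[Formula], seen=frozenset()) -> bool:
    """Prove (cls entity) from the formulas in x by chaining through the subclass rules."""
    if (cls, entity) in x:
        return True
    if cls in seen:          # guard against cyclic rules
        return False
    return any(holds(sub, entity, x, seen | {cls})
               for sup, sub in SUBCLASS_RULES if sup == cls)

def map_end(x: List[Formula]) -> List[Formula]:
    """Rule 1: for a sequence _D, map (END _E _D) to (SEQUENCE-ELEMENT _E 1 _D)."""
    out = []
    for f in x:
        if f[0] == "END" and holds("SEQUENCE", f[2], x):
            out.append(("SEQUENCE-ELEMENT", f[1], 1, f[2]))
        else:
            out.append(f)
    return out

print(map_end([("STACK", "S1"), ("END", "TOP1", "S1")]))
# [('STACK', 'S1'), ('SEQUENCE-ELEMENT', 'TOP1', 1, 'S1')]
```

An END formula about an entity that cannot be shown to be a sequence is left untouched. Other mapping rules, relating it to a different view, would then be needed.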

The system knows how to map the notion of "end of a sequence", and it knows that ordered lists are sequences. Since the first sentence is discussing the end of an ordered list, the two rules above are sufficient to map "end" into the appropriate concrete semantic representation. The power and generality of this approach lie in two facts:

- a chain of reasoning may show how to view some entity _D as a sequence (and therefore the rules show how to interpret "end of _D"), and

- other mapping rules may state how to interpret spatial metaphors unrelated to "end" or to sequences.

We propose that the same mechanism can deal with certain vague, extended uses of words, such as "add" in the previous example. In stating that ADD(I,S) adds item I to stack S, "add" cannot be predefined, since its meaning is being defined for stacks. Nevertheless, it is reasonable to assume that there is a general relation between "add" and related concepts such as uniting, including, or, in data structure environments, inserting. Consequently, we propose the following fact in addition to the two above:

- For a sequence _S, we may map "add _I to _S" to "insert _I at some position _X of _S".

It may be stated formally as

(MAP _W _Z) IF (MAPPING-RULE ((ADD _I _S)) _W ((INSERT _I _S _X)) _Z)
               & (SEQUENCE _S)

Notice that _X will be unbound. However, the Horn clauses generated for the first sentence ("A stack is an ordered list in which all insertions and deletions occur at one end called the top") will imply that _X is the position corresponding to the end called the top. Therefore, the vague, extended use of "add" can be understood using the inference mechanism of the mapping component. Other rules may state how to interpret an extended use of "add" by relating it to views other than sequences.
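The effect of the "add" rule can be sketched in Python in two steps. Mapping ADD to INSERT introduces a fresh, unbound position variable, and a later step binds it once other clauses determine the insertion point. The `sequences` set stands in for proving (SEQUENCE _S). The `insertion_point` table is a hypothetical summary of what the clauses for the first sentence would imply. Neither, nor the function names, is part of the described system.

```python
import itertools
from typing import Dict, List, Set

_counter = itertools.count(1)

def fresh_var() -> str:
    """A new variable name in the paper's underscore convention."""
    return f"_X{next(_counter)}"

def map_add(x: List[tuple], sequences: Set[str]) -> List[tuple]:
    """For a sequence _S, map (ADD _I _S) to (INSERT _I _S _X) with _X unbound."""
    out = []
    for f in x:
        if f[0] == "ADD" and f[2] in sequences:
            out.append(("INSERT", f[1], f[2], fresh_var()))
        else:
            out.append(f)
    return out

def bind_positions(x: List[tuple], insertion_point: Dict[str, str]) -> List[tuple]:
    """Bind an unbound INSERT position once other clauses determine it (hypothetical)."""
    out = []
    for f in x:
        pos = f[3] if f[0] == "INSERT" and len(f) == 4 else None
        if isinstance(pos, str) and pos.startswith("_") and f[2] in insertion_point:
            f = f[:3] + (insertion_point[f[2]],)
        out.append(f)
    return out

mapped = map_add([("ADD", "I", "S")], sequences={"S"})
print(mapped)                                  # [('INSERT', 'I', 'S', '_X1')]
print(bind_positions(mapped, {"S": "TOP"}))    # [('INSERT', 'I', 'S', 'TOP')]
```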

Another problem involves mapping the forms "ith element", "element i", and "element at position i" into the same representation. Assume that, for each of the first two, the semantic interpreter generates the list of formulas ((ELEMENT _X) (IDENTIFIED-BY _X _Y)). The Horn clause for that mapping is as follows:

(MAP _W _Z) IF (TOPIC _T) & (SEQUENCE _T) &
               (MAPPING-RULE ((ELEMENT _X) (IDENTIFIED-BY _X _Y)) _W
                             ((SEQUENCE-ELEMENT _X _Y _T)) _Z)

Note that this rule assumes that some sequence _T has been identified in context as the topic. The rule identifies the element _X as the _Yth member of the sequence _T.

For the phrase "element at position i", assume the semantic interpreter generates the list of formulas ((ELEMENT _X) (AT _X (POSITION _P)) (IDENTIFIED-BY _P _Y)). The mapping rule for it is similar to the one above:

(MAP _W _Z) IF (TOPIC _T) & (SEQUENCE _T) &
               (MAPPING-RULE
                 ((ELEMENT _X) (AT _X (POSITION _P)) (IDENTIFIED-BY _P _Y))
                 _W ((SEQUENCE-ELEMENT _X _Y _T)) _Z)

This second rule must be tried before the prior one.

The mapper halts when no more rules can be applied.
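The two element rules and the halting condition can be sketched as a loop that repeatedly applies the first applicable rule, in a fixed order, until none applies. The nonstructural antecedents (TOPIC _T) & (SEQUENCE _T) are checked against a separate context set, and _T is pre-bound before matching. The rule list puts the "element at position i" rule first, because the text states that it must be tried before the other one. The function names and the context representation are hypothetical. This sketch does not reproduce the real mapper's control strategy. Each application removes at least two formulas and adds one, so the loop terminates.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Bindings = Dict[str, object]

def is_var(t) -> bool:
    return isinstance(t, str) and t.startswith("_")

def match(pat, term, b: Bindings) -> Optional[Bindings]:
    """One-way matching of a rule pattern against a ground formula."""
    if is_var(pat):
        if pat in b:
            return b if b[pat] == term else None
        return {**b, pat: term}
    if isinstance(pat, tuple):
        if not (isinstance(term, tuple) and len(term) == len(pat)):
            return None
        for p, t in zip(pat, term):
            b = match(p, t, b)
            if b is None:
                return None
        return b
    return b if pat == term else None

def substitute(t, b: Bindings):
    if is_var(t):
        return b.get(t, t)
    if isinstance(t, tuple):
        return tuple(substitute(x, b) for x in t)
    return t

def find(patterns, x, b, used=frozenset()) -> Iterator[Tuple[Bindings, frozenset]]:
    if not patterns:
        yield b, used
        return
    for i, formula in enumerate(x):
        if i not in used:
            b2 = match(patterns[0], formula, b)
            if b2 is not None:
                yield from find(patterns[1:], x, b2, used | {i})

# Structural part of each rule, (a1 ... am) -> (c1 ... ck), in the order tried.
RULES = [
    ([("ELEMENT", "_X"), ("AT", "_X", ("POSITION", "_P")), ("IDENTIFIED-BY", "_P", "_Y")],
     [("SEQUENCE-ELEMENT", "_X", "_Y", "_T")]),
    ([("ELEMENT", "_X"), ("IDENTIFIED-BY", "_X", "_Y")],
     [("SEQUENCE-ELEMENT", "_X", "_Y", "_T")]),
]

def topic_sequence(context: Set[tuple]) -> Optional[str]:
    """Nonstructural condition (TOPIC _T) & (SEQUENCE _T), checked against context."""
    for f in context:
        if f[0] == "TOPIC" and ("SEQUENCE", f[1]) in context:
            return f[1]
    return None

def run_mapper(x: List[tuple], context: Set[tuple]) -> List[tuple]:
    x = list(x)
    while True:
        topic = topic_sequence(context)
        if topic is None:
            return x
        for patterns, replacement in RULES:
            hit = next(find(patterns, x, {"_T": topic}), None)
            if hit is not None:
                b, used = hit
                x = [f for i, f in enumerate(x) if i not in used]
                x += [substitute(c, b) for c in replacement]
                break                  # restart with the first rule
        else:
            return x                   # no rule applies: the mapper halts

context = {("TOPIC", "L1"), ("SEQUENCE", "L1")}
print(run_mapper([("ELEMENT", "E2"), ("AT", "E2", ("POSITION", "P1")),
                  ("IDENTIFIED-BY", "P1", "I")], context))
# [('SEQUENCE-ELEMENT', 'E2', 'I', 'L1')]
```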

6. RELATED WORK

A number of applied AI systems have been developed to support automating software construction [Balzer 78, Green 76, Biermann 80, Gomez 82]. Of these, our effort is the only one that has focused on the linguistic issues in the mapping problem. It is also distinguished by its design decisions regarding the target language and parsing/semantic interpretation. The systems in [Green 76, Biermann 80, Gomez 82] were designed for generating algorithms from English input. In algorithm generation, the efficiency of the generated algorithm is of critical concern. This problem is not critical in module specification, since the specification forms a contract stating what programs implementing the specification must do.

Viewing spatial metaphors in terms of a scale was proposed in [Hobbs 77]. Our model is somewhat more general in that the inference process

- permits specific constraints for each metaphor, not just the one view of a scale, and

- accounts for other mapping problems in addition to spatial metaphor.

A very similar approach to mapping has been proposed in [Mark 80]. Instead of using Horn clauses as the formalism for mapping, its authors encode their rules in KL-ONE [Brachman 78]. The concern in [Mark 80] is inferring the appropriate service to perform in response to a user request, rather than demonstrating means of interpreting spatial metaphors or of finding contextually dependent paraphrases.

The value of generating a paraphrase of a formal specification has been discussed in [Swartout 82]. Language generation is a very active area of research; an overview of the state of the art is provided in [Mann 81]. No generation component has been included in our prototype system.


7. CONCLUSIONS

The design of a system to generate formal specifications from natural language definitions is a long-term research goal. The availability of broad-coverage grammars [Bobrow 78, Robinson 82, Sager 81] that use selection restrictions while parsing to eliminate anomalous parses is an important step toward that goal. There are five broad areas for future work:

- formal specification languages with richer semantics, so that the level of the target language is closer to that of natural languages,

- development of more flexible, forgiving natural language interfaces [Weischedel 83] that have partial understanding even of poorly formed input,

- extension of the technology to broad areas of specification,

- development of high-quality English generation components, both for creating a paraphrase of the formal specification and for generating questions to clarify ambiguous or vague aspects of the English definitions, and

- further development of the mapping phase.

There are several reasons why one may want such a mapping phase even after a semantic representation for an utterance has been computed. The advantage of using Horn clauses (or any other deduction mechanism) in this mapping phase is the ability to include nonstructural conditions. This means that the mapping rules may be based on reasoning about context.

There are three areas for further development of the mapping phase:

- generating mapping rules based on additional texts,

- investigating use of the mapping component in reference resolution, and

- developing an indexing technique to run the mapper in a forward-chaining mode.


APPENDIX

We include here the actual Horn clauses that serve as the output of the semantic component and as the input to the mapping component. The English that generated the Horn clauses is shown for reference before each group of clauses; it is not supplied as input to the mapping component. Ampersands have been inserted for expository purposes. For the first sentence, there is no easy way to convert the disjunction to a Horn clause. Therefore, we generate an extended notation allowing disjunction for that case.


We say that an ordered list is empty or it can be written as (A[1], A[2], ..., A[N])
where the A[i] are atoms from some set S.

((OR (((EMPTY A23) IF (LIST A23) & (ORDER NIL A23)))
     (((EQUIV (A0031 A23) (TUPLE (SUBSCRIPT A 1) ELLIPSIS (SUBSCRIPT A N)))
        IF (LIST A23) & (ORDER NIL A23))
      ((NOTATION NIL A23 (A0031 A23)) IF (LIST A23) & (ORDER NIL A23))
      ((SET (A0032 A23)) IF (EQUIV A71 (SUBSCRIPT A I)) & (LIST A23)
        & (ORDER NIL A23))
      ((IDENTIFIED-BY NIL (A0032 A23) S)
        IF (EQUIV A71 (SUBSCRIPT A I)) & (LIST A23) & (ORDER NIL A23))
      ((MEMBERS-OF A71 (A0032 A23))
        IF (EQUIV A71 (SUBSCRIPT A I)) & (LIST A23) & (ORDER NIL A23)))))


There are a variety of operations that are performed on these lists.

(((VARIETY (A0033 A23))
   IF (OPERATION A29) & (PERFORM NIL A29 A23) & (LIST A23) & (ORDER NIL A23))
 ((MEMBERS-OF A29 (A0033 A23))
   IF (OPERATION A29) & (PERFORM NIL A29 A23) & (LIST A23) & (ORDER NIL A23)))


These operations include the following.

(((INCLUDE A16 A340) IF (FOLLOW A340)
   & (EQUIV A16 (SETOF A0034 (AND (OPERATION A0034) (PERFORM NIL A0034 A23))))
   & (LIST A23) & (ORDER NIL A23)))


Find the length, N, of the list.

(((EQUIV (A0037 A23) N) IF (LIST A23) & (ORDER NIL A23))
 ((LENGTH (A0038 A23) A23) IF (LIST A23) & (ORDER NIL A23))
 ((EQUIV (A0038 A23) (A0037 A23)) IF (LIST A23) & (ORDER NIL A23))
 ((FOLLOW (FIND NIL (A0038 A23))) IF (LIST A23) & (ORDER NIL A23)))


Retrieve the ith element, 1<=i<=N.

(((LE 1 I) IF (ELEMENT A22) & (IDENTIFIED-BY NIL A22 I))
 ((LE I N) IF (ELEMENT A22) & (IDENTIFIED-BY NIL A22 I))
 ((FOLLOW (RETRIEVE-FROM NIL A22 NIL))
   IF (ELEMENT A22) & (IDENTIFIED-BY NIL A22 I)))


Store a new value into the ith position, 1<=i<=N.

(((LE 1 I) IF (POSITION A33) & (IDENTIFIED-BY NIL A33 I) & (VALUE A15) & (NEW A15))
 ((LE I N) IF (POSITION A33) & (IDENTIFIED-BY NIL A33 I) & (VALUE A15) & (NEW A15))
 ((FOLLOW (STORE NIL A15 ONTO A33))
   IF (POSITION A33) & (IDENTIFIED-BY NIL A33 I) & (VALUE A15) & (NEW A15)))


Insert a new element at position I, 1<=I<=N+1, causing elements numbered
I, I+1, ..., N to become numbered I+1, I+2, ..., N+1.

(((POSITION (A0062 A18)) IF (ELEMENT A18) & (NEW A18))
 ((IDENTIFIED-BY NIL (A0062 A18) I) IF (ELEMENT A18) & (NEW A18))
 ((LE 1 I) IF (ELEMENT A18) & (NEW A18))
 ((LE I (PLUS N 1)) IF (ELEMENT A18) & (NEW A18))
 ((FOLLOW (INSERT NIL A18 NIL (AT (A0062 A18))))
   IF (ELEMENT A18) & (NEW A18))
 ((ITEM-OF (A0063 A54 A18)
           NIL
           (SEQUENCE (PLUS I 1) (PLUS I 2) ELLIPSIS (PLUS N 1)))
   IF (ELEMENT A54) & (ITEM-OF A62 NIL (SEQUENCE I (PLUS I 1) ELLIPSIS N))
   & (IDENTIFIED-BY NIL A54 A62) & (NUMBER A62) & (ELEMENT A18) & (NEW A18))
 ((CAUSE (INSERT NIL A18 NIL (AT (A0062 A18)))
         (COME-ABOUT (AND (IDENTIFIED-BY NIL A54 (A0063 A54 A18))
                          (NUMBER (A0063 A54 A18)))))
   IF (ELEMENT A54) & (ITEM-OF A62 NIL (SEQUENCE I (PLUS I 1) ELLIPSIS N))
   & (IDENTIFIED-BY NIL A54 A62) & (NUMBER A62) & (ELEMENT A18) & (NEW A18)))


Delete the element at position I, 1<=I<=N, causing elements numbered
I+1, ..., N to become numbered I, I+1, ..., N-1.

(((POSITION (A0076 A17)) IF (ELEMENT A17) & (AT A17 A27))
 ((IDENTIFIED-BY NIL (A0076 A17) I) IF (ELEMENT A17) & (AT A17 A27))
 ((LE 1 I) IF (ELEMENT A17) & (AT A17 A27))
 ((LE I N) IF (ELEMENT A17) & (AT A17 A27))
 ((FOLLOW (DELETE NIL A17)) IF (ELEMENT A17) & (AT A17 A27))
 ((ITEM-OF (A0077 A51 A17)
           NIL
           (SEQUENCE I (PLUS I 1) ELLIPSIS (SUB N 1)))
   IF (ELEMENT A51) & (ITEM-OF A59 NIL (SEQUENCE (PLUS I 1) ELLIPSIS N))
   & (IDENTIFIED-BY NIL A51 A59) & (NUMBER A59) & (ELEMENT A17) & (AT A17 A27))
 ((CAUSE (DELETE NIL A17)
         (COME-ABOUT (AND (IDENTIFIED-BY NIL A51 (A0077 A51 A17))
                          (NUMBER (A0077 A51 A17)))))
   IF (ELEMENT A51) & (ITEM-OF A59 NIL (SEQUENCE (PLUS I 1) ELLIPSIS N))
   & (IDENTIFIED-BY NIL A51 A59) & (NUMBER A59) & (ELEMENT A17) & (AT A17 A27)))